

What a moose hears 
Vocabulary of Machine Learning

• General Terms
• Broad Classes: AI, ML, Deep ML, Generative ML
• ML Map and Associated Vocabulary
• Neural Networks Map and Associated Vocabulary


Learning vs Regular Computation

[Figure: classical (regular) computation contrasted with learning]


“Learning” - Operational Definition 


“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

~Tom Mitchell (on Machine Learning's Operational Definition)
Carnegie Mellon University, Machine Learning
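A minimal sketch of Mitchell's definition, under assumptions of our own: the task T is labelling a number x in [0, 1] as 0 or 1 according to a hidden threshold (0.37, a hypothetical value); the experience E is n evenly spaced labelled examples; the performance measure P is accuracy on a fixed grid of 1000 test points. The learner places its threshold midway between the largest example labelled 0 and the smallest labelled 1, so P should improve as E grows.

```python
HIDDEN_THRESHOLD = 0.37  # hypothetical target rule, never seen directly by the learner


def target(x: float) -> int:
    """The unknown rule behind the task T."""
    return 1 if x >= HIDDEN_THRESHOLD else 0


def learn_threshold(n: int) -> float:
    """Experience E: n evenly spaced labelled examples on [0, 1] (n >= 2)."""
    xs = [i / (n - 1) for i in range(n)]
    largest_zero = max(x for x in xs if target(x) == 0)
    smallest_one = min(x for x in xs if target(x) == 1)
    return (largest_zero + smallest_one) / 2


def accuracy(threshold: float) -> float:
    """Performance measure P: fraction of 1000 test points labelled correctly."""
    test_points = [(k + 0.5) / 1000 for k in range(1000)]
    correct = sum((1 if t >= threshold else 0) == target(t) for t in test_points)
    return correct / 1000


for n in (3, 11, 101):
    print(n, accuracy(learn_threshold(n)))
# 3 0.88
# 11 0.98
# 101 0.995
```

More experience (larger n) narrows the gap between the closest examples on either side of the hidden threshold, which is exactly "performance at T, as measured by P, improves with experience E".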
“Learning” - Operational Definition

[Figure: the learning cycle: learns from experience → performs task → learns more and further tweaks model]
“Learning” - Minimal Mathematical Formulation

Assuming [hoping] that there exists an unknown functional relationship (mapping) between two “classes” of “entities”:

$$f: X \to Y$$

Samples of known mappings (input-output pairs) may or may not be available:

$$\{(x^{i}, y^{i})\}_{i=1}^{N} \subset X \times Y$$

And, assuming that there is a class of candidate functions H (hypotheses), the goal is to choose (“learn”) the candidate function $h \in H$ that will “satisfactorily” replicate this mapping.
“Learning” - Mathematically

[Figure: the learning diagram]
• Target function $f: X \to Y$ (unknown).
• Sample set (transformed data) $D: \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$; optional (may not be available).
• Hypothesis space $H: \{h_1, h_2, \ldots, h_M\}$.
• Learning algorithm $A$: uses $D$ to select the final hypothesis from $H$.
• Final hypothesis $g: X \to Y$, with $g \in H$ and $g \approx f$.
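A minimal sketch of the diagram's pieces, with everything hypothetical: the hidden target is f(x) = 2x, the sample set D holds three of its input-output pairs, the hypothesis space H is the finite family h_w(x) = w·x for w in {0, 0.5, ..., 4}, and the learning algorithm A simply keeps the hypothesis with the smallest squared error on D.

```python
from typing import Callable, List, Tuple

# Sample set D: known input-output pairs (hypothetical values from f(x) = 2x).
D: List[Tuple[float, float]] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Hypothesis space H: h_w(x) = w * x for w in {0, 0.5, ..., 4}.
WEIGHTS = [i * 0.5 for i in range(9)]


def make_hypothesis(w: float) -> Callable[[float], float]:
    return lambda x: w * x


def squared_error(h: Callable[[float], float], data) -> float:
    return sum((h(x) - y) ** 2 for x, y in data)


def learning_algorithm(data) -> float:
    """A: return the weight of the hypothesis in H that best fits the data."""
    return min(WEIGHTS, key=lambda w: squared_error(make_hypothesis(w), data))


w_g = learning_algorithm(D)
g = make_hypothesis(w_g)  # final hypothesis g
print(w_g, g(10.0))  # 2.0 20.0
```

Here A is an exhaustive search because H is tiny; with a continuous H, A would instead be an optimization procedure.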
“Learning” - Mathematically

In a more general setting, we also recognize that:
• The “mapping” may be probabilistic in nature.
• We will need some “error measure” to choose the “best” candidate.

[Figure: the learning diagram with noise]
• Unknown target distribution $P(y \mid x)$: target function $f: X \to Y$ plus noise.
• Probability distribution $P$ on $X$, which generates the inputs $x_1, \ldots, x_N$.
• Training examples $(x_1, y_1), \ldots, (x_N, y_N)$.
• Error measure.
• Learning algorithm $A$ and hypothesis set $H$.
• Final hypothesis $g: X \to Y$.
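A sketch of why an error measure is needed once the mapping is noisy. Assumptions for illustration only: the target function is f(x) = 3x + 1, the noise is Gaussian with standard deviation 0.1, inputs are drawn uniformly from [0, 1], and the error measure is the mean squared error. Even the true f has nonzero error on the data, so we can only ask which candidate is "best", not which is perfect.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def f(x: float) -> float:
    """Hypothetical target function."""
    return 3 * x + 1


# Inputs drawn from P(x) (uniform); outputs from P(y | x) = f(x) plus noise.
xs = [random.uniform(0, 1) for _ in range(200)]
examples = [(x, f(x) + random.gauss(0, 0.1)) for x in xs]


def mean_squared_error(h, data) -> float:
    """Error measure: average squared gap between h(x) and the observed y."""
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)


candidates = {
    "h1(x) = 3x + 1": lambda x: 3 * x + 1,
    "h2(x) = 3x + 2": lambda x: 3 * x + 2,
    "h3(x) = x + 2": lambda x: x + 2,
}
errors = {name: mean_squared_error(h, examples) for name, h in candidates.items()}
best = min(errors, key=errors.get)
print(best, round(errors[best], 4))
```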
“Data”

Data is what ML works with.


Numerical: made of numbers (age, weight, number of children, shoe size)
• Continuous: infinite options (age, weight, blood pressure)
• Discrete: finite options (shoe size, number of children)
Categorical: made of words (eye colour, gender, blood type, ethnicity)
• Ordinal: data has a hierarchy (pain severity, satisfaction rating, mood)
• Nominal: data has no hierarchy (eye colour, dog breed, blood type)
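One common way (among several) to turn categorical data into numbers that ML algorithms can use: ordinal values become integer ranks that preserve their hierarchy, while nominal values become one-hot vectors so that no artificial order is implied. The category lists below are hypothetical examples.

```python
from typing import Dict, List

# Ordinal: the order of this list encodes the hierarchy (hypothetical levels).
PAIN_LEVELS = ["none", "mild", "moderate", "severe"]


def encode_ordinal(values: List[str], order: List[str]) -> List[int]:
    """Map each ordinal value to its rank in the given order."""
    rank: Dict[str, int] = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


def encode_one_hot(values: List[str]) -> List[List[int]]:
    """Nominal: one 0/1 column per category, categories sorted alphabetically."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


print(encode_ordinal(["mild", "severe", "none"], PAIN_LEVELS))  # [1, 3, 0]
print(encode_one_hot(["blue", "brown", "green", "blue"]))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Giving nominal data integer codes (blue = 0, brown = 1, green = 2) would tell many algorithms that green is "twice" brown, which is why one-hot vectors are the usual choice there.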
“Feature” and “Label”
“Feature”: some characteristics of input data (in many ML applications, the input data itself is called “features”).

[Figure: a table with feature columns X1, X2 and a label column Y. Build a model F(X1, X2) = Y from the table, then use the model to predict Y for new data (X1, X2).]

“Label”: often another name for output data (true labels may or may not be available for training).
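A toy version of the figure's workflow, with a hypothetical table: the feature columns X1, X2 are separated from the label column Y, a model F is built from the rows, and the model then predicts Y for new data. For F we use a 1-nearest-neighbour rule purely as an illustration; any model that maps (X1, X2) to Y would fit the same workflow.

```python
import math
from typing import List, Tuple

# Hypothetical table: each row is (X1, X2, Y).
table = [
    (1.0, 1.0, "low"),
    (1.5, 2.0, "low"),
    (6.0, 5.0, "high"),
    (7.0, 6.5, "high"),
]

# Split into features (inputs) and labels (outputs).
features: List[Tuple[float, float]] = [(x1, x2) for x1, x2, _ in table]
labels: List[str] = [y for _, _, y in table]


def build_model(X, y):
    """Return F(X1, X2) -> Y that copies the label of the nearest stored row."""
    def F(x1: float, x2: float) -> str:
        distances = [math.dist((x1, x2), row) for row in X]
        return y[distances.index(min(distances))]
    return F


F = build_model(features, labels)
print(F(1.2, 1.4), F(6.5, 6.0))  # low high
```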
“Training Data”

Pairs of known inputs (features) and their outputs (labels), i.e., known mappings (input-output pairs) of $f: X \to Y$:

$$\{(x^{i}, y^{i})\}_{i=1}^{N} \subset X \times Y$$

Such samples may or may not be available.
“Model”

A structure and corresponding interpretation that summarizes or partially summarizes a set of data, for description or prediction.

[Figure: observational data summarized by a mathematical model]
“Model”

Example: the same data can be summarized by very different models.

• Data: a table of cosine values for angles of 1° to 45°, from .9998 down to .7071 (e.g., cos 30° = .8660).
• A closed-form model: the function cos(x).
• A series model (x in radians): $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$
• A neural-network model: $\cos(x) \approx \sum_{j=1}^{M} \alpha_j \, \sigma(w_j^{T} x + b_j)$
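A small check of the "different models, same data" idea: build the table model (values rounded to four decimals, as on the slide) and a truncated series model with terms up to x^6, then compare both against Python's `math.cos` for 1° to 45°. The truncation point is our own choice.

```python
import math

# Table model: cos of each whole degree, rounded to 4 decimals like the slide.
table = {deg: round(math.cos(math.radians(deg)), 4) for deg in range(1, 46)}


def cos_series(x: float, terms: int = 4) -> float:
    """Truncated Taylor series 1 - x^2/2! + x^4/4! - ... (x in radians)."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(terms))


worst_table = max(abs(table[d] - math.cos(math.radians(d))) for d in table)
worst_series = max(
    abs(cos_series(math.radians(d)) - math.cos(math.radians(d))) for d in table
)
print(table[30], worst_table < 1e-4, worst_series < 1e-5)  # 0.866 True True
```

The table stores 45 numbers and answers only for whole degrees; the four-term series stores four coefficients and answers for any angle in the range, with an error below the table's own rounding.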
“Algorithm”

An algorithm is a set of step-by-step instructions that describe how to perform a task.


ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 




Example: suppose you have to learn a and b iteratively with the following rules:
• We initialize a = 1, b = 1.
• After each iteration, either a or b can go up or down by 1, but not both.
• The update criterion is to minimize $e = |z_d - z|$, where $z_d$ = desired value and $z$ = obtained value.
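The slide leaves the relationship between (a, b) and the obtained value z to its figure, so this sketch assumes, purely for illustration, that z = a·b and that the desired value is z_d = 12. Following the stated rules, each iteration tries moving exactly one of a or b up or down by 1 and keeps the move that most reduces e = |z_d − z|, stopping when no move helps (ties go to the first candidate in a fixed order).

```python
from typing import Tuple

Z_DESIRED = 12  # hypothetical desired value


def obtained(a: int, b: int) -> int:
    """Hypothetical model of how z depends on a and b."""
    return a * b


def error(a: int, b: int) -> int:
    return abs(Z_DESIRED - obtained(a, b))


def learn_a_b(max_iters: int = 100) -> Tuple[int, int, int]:
    a, b = 1, 1  # initialize a = 1, b = 1
    for _ in range(max_iters):
        # Only one of a or b may change, by +1 or -1.
        moves = [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]
        best = min(moves, key=lambda m: error(*m))
        if error(*best) >= error(a, b):
            break  # no allowed move reduces e, so stop
        a, b = best
    return a, b, error(a, b)


print(learn_a_b())  # (4, 3, 0)
```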
Types Of Algorithms

• Brute Force Algorithm
• Recursive Algorithm
• Dynamic Programming Algorithm
• Divide and Conquer Algorithm
• Greedy Algorithm
• Backtracking Algorithm
• Randomized Algorithm
“Brute Force Algorithm”: systematically checks all possible candidates against a given criterion (“exhaustive search”).

Strategy: try all possible paths to find those with the maximum sum.

Homework: what could be the drawbacks [benefits]?
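The slide's figure is not recoverable, so this sketch assumes a small number triangle (hypothetical values) in which each step goes down to one of the two adjacent numbers in the next row. Brute force enumerates every one of the 2^(rows − 1) paths and keeps the one with the largest sum.

```python
from itertools import product
from typing import List, Tuple

TRIANGLE = [
    [1],
    [2, 3],
    [9, 1, 2],
    [1, 5, 2, 8],
]


def brute_force_max_path(tri: List[List[int]]) -> Tuple[int, List[int]]:
    best_sum, best_path = None, []
    # Each path is a sequence of "stay left" (0) / "go right" (1) moves.
    for moves in product((0, 1), repeat=len(tri) - 1):
        col, path = 0, [tri[0][0]]
        for row, step in enumerate(moves, start=1):
            col += step
            path.append(tri[row][col])
        if best_sum is None or sum(path) > best_sum:
            best_sum, best_path = sum(path), path
    return best_sum, best_path


print(brute_force_max_path(TRIANGLE))  # (17, [1, 2, 9, 5])
```

The answer is guaranteed to be optimal, but the number of paths doubles with every extra row, which is the usual drawback of exhaustive search.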
“Greedy Algorithm”: systematically makes the locally optimal choice at each step (without regard to long-term optimality).

Strategy: make the locally best choice based on what is directly in front.

Homework: what could be the drawbacks [benefits]?
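A sketch of the greedy strategy on a hypothetical number triangle where each step goes down to one of the two adjacent numbers. On this triangle the best path is 1 → 2 → 9 → 5 (sum 17), but greedy looks only at the two numbers directly below, commits to 3 at the second step, and ends with 14.

```python
from typing import List, Tuple

TRIANGLE = [
    [1],
    [2, 3],
    [9, 1, 2],
    [1, 5, 2, 8],
]


def greedy_path(tri: List[List[int]]) -> Tuple[int, List[int]]:
    col, path = 0, [tri[0][0]]
    for row in tri[1:]:
        # Look only at the two numbers directly below and take the larger.
        if row[col + 1] > row[col]:
            col += 1
        path.append(row[col])
    return sum(path), path


print(greedy_path(TRIANGLE))  # (14, [1, 3, 2, 8])
```

Greedy inspects one path instead of all of them, which is its benefit; the price is that it can miss the global best.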
“Greedy Algorithm”

Gradient descent, used heavily in neural networks, is a greedy algorithm: from the current weight, it always steps in the locally steepest downhill direction of the cost function J(w), with the step scaled by the learning rate α:

$$w_{\text{new}} = w_{\text{old}} - \alpha \, \frac{\partial J}{\partial w}$$

[Figure: cost J(w) versus weight w, showing the initial weight, a step scaled by the learning rate α, the new weight, and the minimum point of the cost function]
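A minimal sketch of the gradient descent update on a one-weight cost function chosen for illustration, J(w) = (w − 3)^2, whose minimum is at w = 3. Each step uses only the local slope dJ/dw = 2(w − 3), which is what makes the method greedy.

```python
def cost(w: float) -> float:
    """Hypothetical cost function J(w) with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w: float) -> float:
    """dJ/dw for the cost above."""
    return 2.0 * (w - 3.0)


def gradient_descent(w_initial: float, alpha: float, steps: int) -> float:
    w = w_initial
    for _ in range(steps):
        w = w - alpha * gradient(w)  # w_new = w_old - alpha * dJ/dw
    return w


w_final = gradient_descent(w_initial=0.0, alpha=0.1, steps=100)
print(round(w_final, 6), round(cost(w_final), 12))  # 3.0 0.0
```

For this particular J, each step multiplies the distance to the minimum by (1 − 2α), so α = 0.1 converges steadily while any α > 1 overshoots further each time and diverges.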




“Divide and Conquer Algorithm”: recursively breaks down a problem into two or more subproblems of the same or related type, until these become simple enough to be solved directly.

Homework: what could be the drawbacks [benefits]?

[Figure: divide and conquer]
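Merge sort is a standard example of divide and conquer (our choice of example, since the slide's figure is not recoverable): split the list in half, sort each half recursively, and merge the two sorted halves.

```python
from typing import List


def merge_sort(items: List[int]) -> List[int]:
    # Base case: lists of length 0 or 1 are already sorted ("solved directly").
    if len(items) <= 1:
        return items
    # Divide: split into two halves, then conquer each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Combine: merge the two sorted halves into one sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```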
“Dynamic Programming”: repeatedly breaks down a problem into overlapping subproblems of similar type to save computations.

Homework: what could be the drawbacks [benefits]?

[Figure: the dynamic programming paradigm, combining divide and conquer with overlapping subproblems and optimal substructure]

Strategy: make a decision at each step, considering the current problem and the solution to the previously solved problem, to calculate the optimal solution.




[Figure: a graph of weighted paths from A (start) to J (goal)]

A brute force algorithm would calculate all the paths from A to J and choose the shortest.
A greedy algorithm would choose the locally best option at each step and may miss the global best.

Is there a better way?
E-H-J is shorter than E-I-J, irrespective of whether we come to E from B, C, or D!

[Figure: the same graph from START (A) to GOAL (J)]

Could we save computations by breaking the problem into overlapping subproblems?
• Sub-paths such as E-H-J and E-I-J get checked unnecessarily multiple times!
• Why not check the minimum for E-J, F-J, and G-J and use that instead?
• Why not repeat this abstraction even on the second layer?
Overlapping subproblems!

$$\min(A,J) = \min\big(2 + \min(B,J),\ 4 + \min(C,J),\ 3 + \min(D,J)\big)$$
$$\min(B,J) = \min\big(7 + \min(E,J),\ 4 + \min(F,J),\ 6 + \min(G,J)\big)$$
$$\min(E,J) = \min\big(1 + HJ,\ 4 + IJ\big)$$

Here HJ and IJ are the lengths of the direct edges H-J and I-J.
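A sketch of the slide's recurrences as memoized recursion, where each min(node, J) is computed once and reused. Only some edge weights are legible on the slide (A→B 2, A→C 4, A→D 3; B→E 7, B→F 4, B→G 6; E→H 1, E→I 4); every other weight below is an assumption made for illustration, so the numeric answer applies to this assumed graph only.

```python
from functools import lru_cache

# Edge weights: slide values for A, B and E; all others are assumed.
GRAPH = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},  # assumed
    "D": {"E": 4, "F": 1, "G": 5},  # assumed
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},  # assumed
    "G": {"H": 3, "I": 3},  # assumed
    "H": {"J": 3},  # assumed
    "I": {"J": 4},  # assumed
    "J": {},
}


@lru_cache(maxsize=None)
def min_cost(node: str) -> int:
    """min(node, J): each subproblem is solved once, then served from the cache."""
    if node == "J":
        return 0
    return min(w + min_cost(nxt) for nxt, w in GRAPH[node].items())


def best_path(node: str = "A") -> list:
    """Follow, from each node, the edge that achieves min(node, J)."""
    path = [node]
    while node != "J":
        node = min(GRAPH[node], key=lambda nxt: GRAPH[node][nxt] + min_cost(nxt))
        path.append(node)
    return path


print(min_cost("A"), best_path())  # 11 ['A', 'C', 'E', 'H', 'J']
```

On this assumed graph, always taking the cheapest next edge gives A → B → F → I → J with cost 13, so the greedy walk misses the optimum of 11, while brute force would examine all 18 A-to-J paths.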
By Any Other Name...

INPUT TERMS: e.g., features, independent variables
OUTPUT TERMS: e.g., responses, dependent variables, labels
Relationship: AI, ML, and Deep Learning

• Artificial Intelligence: development of smart systems and machines that can carry out tasks that typically require human intelligence.
• Deep Learning: uses an artificial neural network to reach accurate conclusions without human intervention.

(Figure credit: COMPUTER SOCIETY)


What Else Does AI Cover (Other Than ML)?

[Figure: the fields covered by Artificial Intelligence]
What About Generative AI?

• Artificial Intelligence: the theory and methods to build machines that think and act like humans.
  - Expert System AI: programmers teach AI exactly how to solve specific problems by providing precise instructions and steps.
  - Machine Learning: the ability for computers to learn from experience or data without human programming.
    - Generative AI: generates new text, audio, images, video, or code based on content it has been pre-trained on.

(Source: aiforeducation.io)
The ML Map

[Figure: the ML map]
• Classical Learning
  - Classification: KNN, logistic regression, naive Bayes, decision tree, SVM
  - Regression: linear regression, polynomial regression, lasso and ridge regression
  - Clustering: fuzzy C-means, k-means, DBSCAN, mean-shift
  - Pattern search: Eclat, Apriori, FP-Growth
  - Dimensionality reduction and visualization: PCA, LDA, SVD, t-SNE, QDA, LSA, LLE
• Reinforcement Learning: Q-learning, Deep Q-Network (DQN), A3C, genetic algorithm, SARSA
• Ensemble Learning: random forest, stacking, boosting (AdaBoost, XGBoost, GradientBoost, CatBoost, LightGBM)
• Artificial Neural Networks: multi-layer perceptron (MLP), convolutional neural networks (CNN, DCNN), recurrent neural networks (RNN), modular neural networks, seq2seq, radial basis function neural networks (RBFNN), generative adversarial networks (GANs)

Main Types of Machine Learning Systems
• Classical ML: enough data, defined features.
• Reinforcement Learning: no data, but we have an environment to interact with.
• Ensemble Learning: when classical ML is not enough.
• Artificial Neural Nets: complicated data, unclear features, belief in a miracle.
Classical Machine Learning
aka
Statistical Inference

[Figure: the ML map, focusing on the Classical Learning branch]


Regression: kind of another name for “Curve Fitting”

Obs   Hours   Score
1     1       64
2     2       66
3     4       76
4     3       73
5     5       74
6     6       81
7     6       83
8     ?       82
9     8       80
10    10      88
11    11      84
12    3       82
13    12      ?
14    12      93
15    14      89

[Figure: simple linear regression vs. nonlinear regression fits]

Q. Why do we do it?
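A sketch of fitting the straight line ŷ = b0 + b1·x by ordinary least squares. Because two cells of the slide's table are illegible, the example uses a small hypothetical data set instead; the same function would apply unchanged to the full table.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


xs = [1, 2, 3, 4, 5]  # hypothetical inputs
ys = [2, 4, 5, 4, 5]  # hypothetical outputs
b0, b1 = fit_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 2.2 0.6
```

One answer to "why do we do it?": once fitted, the curve predicts outputs for inputs we have not seen, e.g. x = 6 gives 2.2 + 0.6 · 6 = 5.8.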
Regression
Broader Category: Supervised Learning

[Figure: training data links the input terms (independent variables) to the output terms (dependent variables)]
Regression: kind of another name for “Curve Fitting”

Multiple Linear Regression: multiple independent variables, one dependent variable.

[Table: for each student, study hours and prep exams taken (independent variables) and final exam score (dependent variable)]
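With several independent variables, least squares solves the normal equations (XᵀX)β = Xᵀy, where X has a leading column of ones for the intercept. A dependency-free sketch; the study-hours/prep-exam numbers are hypothetical and were generated from score = 50 + 3·hours + 2·prep_exams, so the fit should recover those coefficients.

```python
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_multiple_linear(rows: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing squared error via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


study_hours_and_prep = [[1, 0], [2, 1], [3, 1], [4, 2], [5, 3]]  # hypothetical
final_scores = [53, 58, 61, 66, 71]                            # hypothetical
coefs = fit_multiple_linear(study_hours_and_prep, final_scores)
print([round(c, 6) for c in coefs])  # [50.0, 3.0, 2.0]
```

Solving the normal equations directly is fine for a toy example; numerical libraries usually prefer QR or SVD-based solvers, which are less sensitive to nearly collinear inputs.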


Regression

Sometimes we fit a probability curve to a “categorical” dependent variable, rather than a truly numeric one.

Hours (x_k): 0.50 0.75 1.00 1.25 1.50 1.75 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 4.00 4.25 4.50 4.75 5.00 5.50
Pass (y_k):  0    0    0    0    0    0    1    0    1    0    1    0    1    0    1    1    1    1    1    1
[Figure: probability of passing exam versus hours of studying; each student is a point at 0.00 (fail) or 1.00 (pass)]
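A sketch of fitting the logistic curve p(x) = 1 / (1 + e^(−(b0 + b1·x))) to the hours/pass data by gradient descent on the average log-loss. The learning rate, iteration count, and zero starting point are our own choices; libraries typically use more refined optimizers.

```python
import math

hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 1, 1, 1, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, alpha=0.1, steps=20000):
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-loss with respect to b0 and b1.
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= alpha * sum(errors) / n
        b1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


b0, b1 = fit_logistic(hours, passed)
boundary = -b0 / b1  # hours at which the fitted probability equals 0.5
print(round(sigmoid(b0 + b1 * 1.0), 2), round(sigmoid(b0 + b1 * 5.0), 2), round(boundary, 2))
```

Thresholding the fitted probability at 0.5 turns this regression curve into a pass/fail decision at `boundary` hours, which links regression to classification.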


Classification: kind of “finding delimiting curve[s]”
Classification: kind of “finding delimiting curve[s]”
Broader Category: Supervised Learning

[Figure: two classes separated by a linear boundary]

• Training: labelled data is provided to the system, and it finds class boundaries (“delimiting curve[s]”).
• Identification: new data is given to the system, and it labels it (i.e., identifies which class it belongs to).
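The training/identification split can be sketched with the perceptron, one classic algorithm for finding a linear boundary w·x + b = 0 (chosen here for brevity; the slide does not name an algorithm). The two-class data set is hypothetical and linearly separable, which is what guarantees that the perceptron stops.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical labelled training data: classes +1 and -1.
training: List[Tuple[Point, int]] = [
    ((0.0, 0.0), -1), ((0.2, 0.3), -1), ((0.4, 0.1), -1),
    ((1.0, 1.0), +1), ((0.9, 0.8), +1), ((1.2, 0.5), +1),
]


def train_perceptron(data, max_epochs: int = 100):
    """Training: adjust w and b until every point lies on its correct side."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified or on the boundary
                w = [w[0] + y * x1, w[1] + y * x2]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def identify(w, b, point: Point) -> int:
    """Identification: label new data by which side of the boundary it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


w, b = train_perceptron(training)
print(w, b)                                                    # [1.0, 1.0] -1.0
print(identify(w, b, (0.1, 0.1)), identify(w, b, (1.5, 1.5)))  # -1 1
```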
Classification
Broader Category: Supervised Learning

[Figure: in the (X1, X2) plane, a regression curve fits the data, while a classification curve separates the classes]
Classification

[Figure: probability of passing exam versus hours studying, showing the logistic (categorical) regression curve and the classification curve that labels each student profile as pass or fail]
Classification

We are not limited to linear boundaries!




• Linearly separable: a linear decision boundary that separates the two classes exists.
• Not linearly separable: no linear decision boundary that separates the two classes perfectly exists, so a nonlinear boundary is needed.
Classification

Neither are we limited to two dimensions!

[Figure: a kernel function maps the data from the input space to a feature space]

In fact, a problem that is not linearly separable in lower dimensions could be linearly separable in higher dimensions!
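A small illustration of the last point, using an explicit feature map rather than a kernel function: one-dimensional points labelled +1 when |x| > 1 cannot be split by any single threshold on x, but after mapping each x to (x, x²), the horizontal line x² = 1.25 in the new two-dimensional space separates them. The data and the map are our own example.

```python
from typing import List


def threshold_separates(values: List[float], labels: List[int]) -> bool:
    """True if some rule 'label is +1 exactly when s * (v - t) > 0' fits every point."""
    vs = sorted(set(values))
    candidates = [vs[0] - 1] + [(lo + hi) / 2 for lo, hi in zip(vs, vs[1:])] + [vs[-1] + 1]
    for t in candidates:
        for s in (1, -1):
            if all((s * (v - t) > 0) == (y == 1) for v, y in zip(values, labels)):
                return True
    return False


xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1 if abs(x) > 1 else -1 for x in xs]

# In 1-D, a linear boundary is just a threshold on x: none works here.
print(threshold_separates(xs, labels))  # False

# Map to feature space (x, x^2). A threshold on the second coordinate is a
# line in that plane, and x^2 = 1.25 separates the classes.
lifted = [(x, x * x) for x in xs]
print(threshold_separates([z for _, z in lifted], labels))  # True
```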
Classification

We are also not limited to just two classes!

[Figure: binary classification vs. multi-class classification, where a machine learning model assigns one of several classes (e.g., "Plane") to each input. Credit: Zoumana KEITA]
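Many classifiers extend directly to more than two classes. A nearest-centroid classifier is one of the simplest (our choice of example): training averages the points of each class, and a new point receives the class whose average is closest. The three-class data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

training: List[Tuple[Point, str]] = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
    ((0, 5), "C"), ((1, 6), "C"), ((0, 6), "C"),
]


def fit_centroids(data) -> Dict[str, Point]:
    """Average the points of each class."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for point, label in data:
        groups[label].append(point)
    return {
        label: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for label, pts in groups.items()
    }


def predict(centroids: Dict[str, Point], point: Point) -> str:
    """Pick the class whose centroid is nearest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


centroids = fit_centroids(training)
print([predict(centroids, p) for p in [(0.5, 0.5), (5.5, 5.5), (0.5, 5.5)]])
# ['A', 'B', 'C']
```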
Classification

Multi-Label Classification: we could even identify multiple labels within one input!

[Figure: a machine learning model predicting several labels (e.g., Boat, Cat, Plane) for a single input. Credit: Zoumana KEITA]
Classification

One often has to decide what kind of losses (misclassifications) to accept.

Finally, keep in mind that classification is not generally as simple as this!
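A sketch of one such decision: if a classifier outputs a score and we call an input positive when the score reaches a threshold, moving the threshold trades one kind of misclassification for the other. The scores and labels below are hypothetical.

```python
from typing import Tuple

scores = [0.10, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90]  # hypothetical model outputs
labels = [0, 0, 1, 0, 1, 1, 1]                        # true classes


def errors_at(threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) when predicting 1 for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


for t in (0.3, 0.5, 0.7):
    print(t, errors_at(t))
# 0.3 (2, 0)
# 0.5 (1, 1)
# 0.7 (0, 2)
```

Which threshold is "best" depends on the application: a medical screen may accept more false positives to avoid missing a case, while a spam filter may prefer the opposite.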
Clustering: “Grouping without training!”
Broader Category: Unsupervised Learning
How many different language characters do you see? [Figure: characters from several scripts]

Even though you probably do not know any of these languages, you can attempt to answer that question. Without having been trained on the scripts, you can probably make out similarities and differences, and use these for potential grouping (“clustering”).
Classification (supervised learning) vs. clustering (unsupervised learning).
[Figure: input raw data → algorithm → output groups]

e.g., automatically group customers into different segments based on their purchasing behavior.
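k-means is one common clustering algorithm (our choice; the slide does not name one). The sketch alternates two steps: assign each point to its nearest centre, then move each centre to the mean of its points, until no assignment changes. The 2-D "customer" features are hypothetical, and initializing with the first k points is a simplification (practical implementations use random or k-means++ initialization).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def k_means(points: List[Point], k: int, iterations: int = 100) -> List[int]:
    centres = list(points[:k])  # simplistic initialization: the first k points
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centre for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centre moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment


# Hypothetical customers: (purchases per month, average basket size).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
print(k_means(customers, 2))  # [0, 0, 0, 1, 1, 1]
```

No labels were used: the two segments emerge only from the distances between the points.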
Association: “Discovering Links”
Broader Category: Unsupervised Learning

"93% of people who purchased item A also purchased item B"
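A statement such as "93% of people who purchased item A also purchased item B" is an association rule A → B with a confidence of 93%. Rules like this are usually scored by support (how often the items occur together) and confidence (how often B appears given A); a sketch of both measures on hypothetical shopping baskets:

```python
from typing import FrozenSet, List, Set

baskets: List[FrozenSet[str]] = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B"}),
]


def count(itemset: Set[str]) -> int:
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def support(itemset: Set[str]) -> float:
    return count(itemset) / len(baskets)


def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Fraction of baskets with the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"A", "B"}), confidence({"A"}, {"B"}))  # 0.6 0.75
```

Algorithms such as Apriori and FP-Growth (both on the ML map) exist to find the high-support itemsets efficiently instead of testing every combination.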
Pattern Recognition: “Discovering Regularities in Data”
Broader Category: Unsupervised Learning




Example (numbers): 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

[Figure: further examples of regularities in data]
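One way to make "discovering a regularity" concrete for the number sequence: test a candidate rule (here, our own guess that each term is the sum of the two before it) against every term, and if it holds, use it to extend the sequence.

```python
from typing import List, Optional

sequence = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


def fits_sum_of_previous_two(seq: List[int]) -> bool:
    """Check the candidate rule seq[i] == seq[i-1] + seq[i-2] on every term."""
    return all(seq[i] == seq[i - 1] + seq[i - 2] for i in range(2, len(seq)))


def extend(seq: List[int], extra: int) -> Optional[List[int]]:
    """Extend the sequence with the rule, or return None if the rule does not fit."""
    if not fits_sum_of_previous_two(seq):
        return None
    out = list(seq)
    for _ in range(extra):
        out.append(out[-1] + out[-2])
    return out


print(extend(sequence, 3)[-3:])  # [55, 89, 144]
print(extend([1, 2, 4, 8], 1))   # None
```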
Dimensionality Reduction: “Removing data attributes that are of least or no importance to the task at hand”
Broader Category: Unsupervised Learning
Data may contain unnecessary details!




LinkedIn: I'm honored and thrilled to announce that I have been selected among the top 5 applicants who participated in the professional and most respected exam which evaluates the skill and ability to operate fuel-based vehicles. I cannot wait to see what the next chapter holds, and I cannot express my appreciation to the ministry of transportation, Google, NASA, and my neighbors who supported me during this challenging journey.

Reality: I got my driving license.
A sparse representation of data may be possible!

[Figure: a signal shown as amplitude versus time, and its Fourier transform shown versus frequency]
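A sketch of the sparsity idea: a 64-sample signal made of two sinusoids (a hypothetical signal) needs 64 numbers in the time domain, but its discrete Fourier transform has only four non-negligible coefficients (each frequency appears at bins k and N − k). The naive O(N²) DFT keeps the example dependency-free.

```python
import cmath
import math
from typing import List

N = 64
signal = [
    math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 10 * n / N)
    for n in range(N)
]


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform: X[k] = sum_n x[n] e^(-2 pi i k n / N)."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


spectrum = dft(signal)
significant = [k for k, c in enumerate(spectrum) if abs(c) > 1e-6]
print(significant)  # [3, 10, 54, 61]
```

Keeping only those four coefficients (and discarding the rest) reconstructs the signal, which is the sense in which the frequency representation is sparse.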
May be possible to reduce dimensions!

[Figure: data plotted against Variable 1 and Variable 2, with principal component axes PC1 and PC2; most variation in the data lies along PC1]

After readjusting our axes, the data mainly changes along one dimension.
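A dependency-free sketch of principal component analysis (PCA), one standard way of "readjusting the axes": compute the 2×2 sample covariance matrix, take its eigenvector with the largest eigenvalue as PC1, and report the fraction of total variance that PC1 explains. The five points are hypothetical and lie close to the line y = x.

```python
import math
from typing import List, Tuple

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.0, 4.1, 4.9]


def pca_2d(xs: List[float], ys: List[float]) -> Tuple[Tuple[float, float], float]:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for lam1 (falls back to a coordinate axis when b == 0).
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


pc1, explained = pca_2d(xs, ys)
print([round(v, 3) for v in pc1], round(explained, 3))
```

PC1 comes out close to (1, 1)/√2 and explains over 99% of the variance, so describing each point by its single coordinate along PC1 loses very little.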
May be possible to merge attributes!

[Table: customers with attributes Customer Age, Customer Country, and purchases of Twix, Milky Way, Skittles, Dum Dums, and Blow Pops; individual candy columns can be merged into broader attributes such as Chocolates and Lollipops]
Why would we want to reduce dimensions? Curse of Dimensionality!

• 10 data points with 10 attributes each give 100 attribute values; with 5 attributes each, they give 50.
• 1 million data points with 10 attributes each give 10 million attribute values; with 5 attributes each, they give 5 million.
• Difference: 5 million!
That completes...

Classical Machine Learning
aka
Statistical Inference

[Figure: the ML map, with the Classical Learning branch (classification, regression, clustering, pattern search, dimensionality reduction) covered]
Questions? Thoughts?